NSF PAR Search | NSF Public Access Repository

Temporal-coherence induces binding of responses to sound sequences in ferret auditory cortex

https://doi.org/10.1016/j.isci.2025.111991

Lu, Kai; Dutta, Kelsey; Mohammed, Ali; Elhilali, Mounya; Shamma, Shihab (March 2025, iScience)

Free, publicly-accessible full text available March 1, 2026
Sparse high-dimensional decomposition of non-primary auditory cortical receptive fields

https://doi.org/10.1371/journal.pcbi.1012721

Mukherjee, Shoutik; Babadi, Behtash; Shamma, Shihab (January 2025, PLOS Computational Biology)
Kumar, Arvind (Ed.)
Characterizing neuronal responses to natural stimuli remains a central goal in sensory neuroscience. In auditory cortical neurons, the stimulus selectivity of elicited spiking activity is summarized by a spectrotemporal receptive field (STRF) that relates neuronal responses to the stimulus spectrogram. Though effective in characterizing primary auditory cortical responses, STRFs of non-primary auditory neurons can be quite intricate, reflecting their mixed selectivity. The complexity of non-primary STRFs hence impedes understanding how acoustic stimulus representations are transformed along the auditory pathway. Here, we focus on the relationship between ferret primary auditory cortex (A1) and a secondary region, dorsal posterior ectosylvian gyrus (PEG). We propose estimating receptive fields in PEG with respect to a well-established high-dimensional computational model of primary-cortical stimulus representations. These “cortical receptive fields” (CortRF) are estimated greedily to identify the salient primary-cortical features modulating spiking responses and in turn related to corresponding spectrotemporal features. Hence, they provide biologically plausible hierarchical decompositions of STRFs in PEG. Such CortRF analysis was applied to PEG neuronal responses to speech and temporally orthogonal ripple combination (TORC) stimuli and, for comparison, to A1 neuronal responses. CortRFs of PEG neurons captured their selectivity to more complex spectrotemporal features than A1 neurons; moreover, CortRF models were more predictive of PEG (but not A1) responses to speech. Our results thus suggest that secondary-cortical stimulus representations can be computed as sparse combinations of primary-cortical features that facilitate encoding natural stimuli. Thus, by adding the primary-cortical representation, we can account for PEG single-unit responses to natural sounds better than bypassing it and considering as input the auditory spectrogram. 
These results confirm with explicit details the presumed hierarchical organization of the auditory cortex.
Full Text Available
Decoding contextual influences on auditory perception from primary auditory cortex

https://doi.org/10.7554/eLife.94296.3

Englitz, Bernhard; Akram, Sahar; Elhilali, Mounya; Shamma, Shihab (December 2024, eLife)

Perception can be highly dependent on stimulus context, but whether and how sensory areas encode the context remains uncertain. We used an ambiguous auditory stimulus – a tritone pair – to investigate the neural activity associated with a preceding contextual stimulus that strongly influenced the tritone pair’s perception: either as an ascending or a descending step in pitch. We recorded single-unit responses from a population of auditory cortical cells in awake ferrets listening to the tritone pairs preceded by the contextual stimulus. We find that the responses adapt locally to the contextual stimulus, consistent with human MEG recordings from the auditory cortex under the same conditions. Decoding the population responses demonstrates that cells responding to pitch-changes are able to predict well the context-sensitive percept of the tritone pairs. Conversely, decoding the individual pitch representations and taking their distance in the circular Shepard tone space predicts the opposite of the percept. The various percepts can be readily captured and explained by a neural model of cortical activity based on populations of adapting, pitch and pitch-direction cells, aligned with the neurophysiological responses. Together, these decoding and model results suggest that contextual influences on perception may well be already encoded at the level of the primary sensory cortices, reflecting basic neural response properties commonly found in these areas.
Full Text Available
IDyOMpy: A new Python-based model for statistical analysis of musical expectations

https://doi.org/10.1016/j.jneumeth.2024.110347

Marion, Guilhem; Gao, Fei; Gold, Benjamin P; Di Liberto, Giovanni M; Shamma, Shihab (December 2024, Journal of Neuroscience Methods)

Full Text Available
The social and neural bases of creative movement: workshop overview

https://doi.org/10.1186/s12868-024-00893-w

Shamma, Shihab; Contreras-Vidal, Jose; Fritz, Jonathan; Lim, Soo-Siang; Tuller, Betty; Edwards, Emmeline; Iyengar, Sunil (December 2024, BMC Neuroscience)

Full Text Available
Temporal coherence shapes cortical responses to speech mixtures in a ferret cocktail party

https://doi.org/10.1038/s42003-024-07096-3

Joshi, Neha; Ng, Wing Yiu; Thakkar, Karan; Duque, Daniel; Yin, Pingbo; Fritz, Jonathan; Elhilali, Mounya; Shamma, Shihab (December 2024, Communications Biology)

Full Text Available
Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

https://doi.org/10.21437/Interspeech.2023-562

Siriwardena, Yashish M; Espy-Wilson, Carol; Shamma, Shihab (August 2023, Interspeech 2023)

Full Text Available
Investigating the cortical tracking of speech and music with sung speech

https://doi.org/10.21437/Interspeech.2023-1949

Cantisani, Giorgia; Chalehchaleh, Amirhossein; Di Liberto, Giovanni; Shamma, Shihab (August 2023, ISCA)

Full Text Available
The Mirrornet: Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction

https://doi.org/10.1109/ICASSP43922.2022.9747358

Siriwardena, Yashish M.; Marion, Guilhem; Shamma, Shihab (May 2022, International Conference on Acoustics, Speech and Signal Processing)

Experiments to understand the sensorimotor neural interactions in the human cortical speech system support the existence of a bidirectional flow of interactions between the auditory and motor regions. Their key function is to enable the brain to ‘learn’ how to control the vocal tract for speech production. This idea is the impetus for the recently proposed "MirrorNet", a constrained autoencoder architecture. In this paper, the MirrorNet is applied to learn, in an unsupervised manner, the controls of a specific audio synthesizer (DIVA) to produce melodies only from their auditory spectrograms. The results demonstrate how the MirrorNet discovers the synthesizer parameters to generate the melodies that closely resemble the original and those of unseen melodies, and even determine the best set of parameters to approximate renditions of complex piano melodies generated by a different synthesizer. This generalizability of the MirrorNet illustrates its potential to discover from sensory data the controls of arbitrary motor-plants.
Full Text Available
Harmonicity Plays a Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems

https://doi.org/10.1109/ICASSP43922.2022.9747314

Parikh, Rahil; Kavalerov, Ilya; Espy-Wilson, Carol; Shamma, Shihab (May 2022, International Conference on Acoustics, Speech and Signal Processing)

Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn to perform segregation. Here we analyze the role of harmonicity on two state-of-the-art Deep Neural Networks (DNN)-based models, Conv-TasNet and DPT-Net [1],[2]. We evaluate their performance with mixtures of natural speech versus slightly manipulated inharmonic speech, where harmonics are slightly frequency jittered. We find that performance deteriorates significantly if one source is even slightly harmonically jittered, e.g., an imperceptible 3% harmonic jitter degrades performance of Conv-TasNet from 15.4 dB to 0.70 dB. Training the model on inharmonic speech does not remedy this sensitivity, instead resulting in worse performance on natural speech mixtures, making inharmonicity a powerful adversarial factor in DNN models. Furthermore, additional analyses reveal that DNN algorithms deviate markedly from biologically inspired algorithms [3] that rely primarily on timing cues and not harmonicity to segregate speech.
Full Text Available
